FinGPT: Democratizing Internet-scale Data for Financial Large Language Models
Large language models (LLMs) have demonstrated remarkable proficiency in
understanding and generating human-like texts, which may potentially
revolutionize the finance industry. However, existing LLMs often fall short in
the financial field, which is mainly attributed to the disparities between
general text data and financial text data. Unfortunately, only a limited
number of financial text datasets are available, and most are quite small;
moreover, BloombergGPT, the first financial LLM (FinLLM), is closed-source
(only its training logs were released). In light of this, we aim to democratize
Internet-scale financial data for LLMs, which is an open challenge due to
diverse data sources, low signal-to-noise ratios, and strong time sensitivity. To
address these challenges, we introduce an open-source, data-centric framework,
Financial Generative Pre-trained Transformer (FinGPT), which automates the
collection and curation of real-time financial data from more than 34
diverse sources on the Internet, providing researchers and practitioners with
accessible and transparent resources to develop their FinLLMs. Additionally, we
propose a simple yet effective strategy for fine-tuning FinLLMs using the
inherent feedback from the market, dubbed Reinforcement Learning with Stock
Prices (RLSP). We also adopt low-rank adaptation methods (LoRA and QLoRA) that
enable users to customize their own FinLLMs from open-source general-purpose
LLMs at a low cost. Finally, we showcase several FinGPT applications, including
robo-advisor, sentiment analysis for algorithmic trading, and low-code
development. FinGPT aims to democratize FinLLMs, stimulate innovation, and
unlock new opportunities in open finance. The code is available at
https://github.com/AI4Finance-Foundation/FinGPT and
https://github.com/AI4Finance-Foundation/FinNLP
Comment: 43 pages, 9 tables, and 3 figures
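The low-rank adaptation idea the abstract credits with low-cost customization can be sketched in a few lines. This is a toy NumPy illustration of the LoRA weight update, not FinGPT's actual implementation; the dimensions, rank, and scaling constant below are illustrative assumptions.

```python
import numpy as np

# LoRA sketch: instead of updating a full weight matrix W (d_out x d_in),
# train two small factors A (r x d_in) and B (d_out x r) with r << d,
# so the adapted weight is W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # zero-initialized: no change at start

def adapted_forward(x):
    """Forward pass with the low-rank update applied on top of frozen W."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0, the adapted model exactly matches the frozen base model.
assert np.allclose(adapted_forward(x), W @ x)

# Trainable parameter count: r*(d_in + d_out) instead of d_in*d_out.
print(r * (d_in + d_out), "trainable params vs", d_in * d_out, "for full fine-tuning")
```

Because only A and B receive gradients, the trainable parameter count drops from d_in*d_out to r*(d_in+d_out), which is what makes customizing a large open-source LLM affordable.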
Interactive System-wise Anomaly Detection
Anomaly detection, where data instances are discovered containing feature
patterns different from the majority, plays a fundamental role in various
applications. However, existing methods struggle in scenarios where the
instances are systems whose characteristics are not readily observed as data;
appropriate interactions with such systems are needed to identify those with
abnormal responses. Detecting system-wise
anomalies is challenging for several reasons: how to formally define the
system-wise anomaly detection problem; how to find effective activation
signals for interacting with systems to progressively collect data and learn
the detector; and how to guarantee stable training in such a non-stationary
scenario with real-time interactions. To address these
challenges, we propose InterSAD (Interactive System-wise Anomaly Detection).
Specifically, we first adopt a Markov decision process to model the
interactive systems and define anomalous systems as those with anomalous
transitions or anomalous rewards. Then, we develop an end-to-end approach that
includes an
encoder-decoder module that learns system embeddings and a policy network that
generates effective activations to separate the embeddings of normal and
anomalous systems. Finally, we design a training method to stabilize the
learning
process, using a replay buffer that stores historical interaction data and
allows it to be re-sampled. Experiments on two benchmark environments,
including identifying anomalous robotic systems and detecting user-data
poisoning in recommendation models, demonstrate the superiority of InterSAD
over state-of-the-art baseline methods.
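The replay-buffer mechanism the abstract describes for stabilizing training under non-stationary, real-time interactions can be sketched as follows. This is a minimal illustrative version; the class name, tuple layout, and capacity are assumptions, not InterSAD's actual code.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores historical interaction tuples so updates can re-sample old
    data rather than depending only on the most recent interactions."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest entries evicted first

    def push(self, state, activation, response, next_state):
        self.buffer.append((state, activation, response, next_state))

    def sample(self, batch_size):
        """Uniformly re-sample a batch of stored interactions."""
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(150):                  # more pushes than capacity
    buf.push(t, f"activation-{t}", t * 0.1, t + 1)

assert len(buf) == 100                # the 50 oldest interactions were evicted
batch = buf.sample(32)
assert len(batch) == 32
```

Uniform re-sampling decorrelates consecutive updates, which is why replay buffers are a standard stabilizer when the data distribution shifts during interaction.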
PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning
Outlier detection is an important task for various data mining applications.
Current outlier detection techniques are often manually designed for specific
domains, requiring substantial human effort for database setup, algorithm
selection, and hyper-parameter tuning. To fill this gap, we present PyODDS, an automated
end-to-end Python system for Outlier Detection with Database Support, which
automatically optimizes an outlier detection pipeline for a new data source at
hand. Specifically, we define a search space over outlier detection pipelines
and a search strategy within that space. PyODDS
enables end-to-end executions based on an Apache Spark backend server and a
light-weight database. It also provides unified interfaces and visualizations
for users with or without a data science or machine learning background. In
particular, we demonstrate PyODDS on several real-world datasets, with
quantitative analyses and visualization results.
Comment: In Companion Proceedings of the Web Conference 2020 (WWW '20)
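The "search space plus search strategy" idea behind automated pipeline selection can be sketched with stand-in detectors. The z-score and IQR rules below are illustrative placeholders for PyODDS's actual algorithm library, and the grid search is a simplified stand-in for its search strategy.

```python
import random
import statistics

def zscore_detector(train, x, k):
    """Flag x as an outlier if it lies more than k standard deviations from the mean."""
    mu, sd = statistics.mean(train), statistics.pstdev(train)
    return abs(x - mu) > k * sd

def iqr_detector(train, x, k):
    """Flag x as an outlier if it falls more than k interquartile ranges outside [Q1, Q3]."""
    s = sorted(train)
    q1, q3 = s[len(s) // 4], s[3 * len(s) // 4]
    iqr = q3 - q1
    return x < q1 - k * iqr or x > q3 + k * iqr

# Search space: each detector paired with candidate hyper-parameter values.
SEARCH_SPACE = {
    zscore_detector: [2.0, 2.5, 3.0],
    iqr_detector: [1.5, 2.0, 3.0],
}

def search(train, labeled):
    """Grid search: keep the (detector, k) pair with the best accuracy on labeled data."""
    best = None
    for detector, ks in SEARCH_SPACE.items():
        for k in ks:
            acc = sum(detector(train, x, k) == y for x, y in labeled) / len(labeled)
            if best is None or acc > best[0]:
                best = (acc, detector, k)
    return best

random.seed(0)
train = [random.gauss(0, 1) for _ in range(500)]
labeled = [(0.1, False), (0.5, False), (8.0, True), (-7.0, True)]
acc, detector, k = search(train, labeled)
assert acc == 1.0
```

A real system would replace the grid with a smarter strategy and score candidates on held-out data, but the structure — an enumerable space of (algorithm, hyper-parameter) pipelines plus a strategy that ranks them — is the same.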